Resilient Distributed Data Management Protocols in Dynamic Resource Environments

Author

  • Fan Yang
Abstract

Traditional cloud systems are designed to tolerate server or rack-level failures that are uncorrelated and unpredictable. Such systems successfully deliver highly available cloud services at global scale. However, the increasing criticality of cloud services to the world economy is raising concerns about the impact on cloud availability of power/network outages, cyber-attacks, administration errors, and other causes of datacenter or larger-scale failures. Recent experience shows that these events can trigger cascading failures and global-scale service outages. Moreover, the continued drive for resource use optimization also increases resource dynamism and volatility. We seek to understand the resilience problem of existing cloud systems and to create highly available data services despite daily, concurrent datacenter outages. We begin by studying the impact of concurrent datacenter outages on data service availability. We characterize the failure behavior of distributed protocols that are widely used in Cassandra [35], varying protocol configurations and resource properties. Our study reveals that using such protocols to achieve high availability under correlated, datacenter-scale outages is costly in storage and update traffic, requiring replication factors of 10 or more. Further analysis reveals that this limitation arises from inflexible replication and management strategies. We argue that this resilience problem stems from traditional assumptions about resource failures that no longer apply to today’s cloud computing environments. While traditional systems assume that failures occur at the server or rack level with uniform, fixed failure rates, the emerging failures we now face are highly correlated, vary across datacenter resources, and are more dynamic and more frequent.
To deliver highly available data services in the face of frequent, correlated outages, we propose new protocols that take the novel approach of exploiting statistical models of datacenter resource failures. Complete modeling of failure statistics enables us to (1) exploit failure correlation for efficient load management and low data storage/movement overhead during large-scale outages, (2) reflect the differences in resource reliability across datacenters for intelligent replica placement, and (3) respond efficiently to failures and recoveries using statistics-based prediction and awareness of correlated failure domains. More specifically, our protocols adopt statistical-model-based management strategies, including dynamic replica management and dynamic quorums that adapt to real-time changes in resource availability, as well as asymmetric resource treatment. In this way, our protocols can achieve 99% service availability and high consistency with low fixed cost and outage-proportional overhead. To demonstrate the improvement offered by our protocols, we will prototype and implement them in the Cassandra framework. The design, implementation, and evaluation of these protocols will be included in the full thesis.
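The link between replication factor and quorum availability mentioned in the abstract can be illustrated with a small calculation. The sketch below is not the thesis's model or protocol: it assumes each replica sits in a different datacenter and that datacenters fail independently, which is exactly the traditional assumption the abstract argues no longer holds. The function names and the 0.9 per-datacenter availability are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def quorum_availability(up_probs: Sequence[float], quorum_size: int) -> float:
    """Probability that at least `quorum_size` replicas are reachable.

    Assumes replicas fail independently (an illustrative simplification).
    """
    if quorum_size < 0:
        raise ValueError("quorum_size must be non-negative")
    # dist[k] = probability that exactly k of the replicas processed so far are up.
    dist = [1.0]
    for p in up_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must lie in [0, 1]")
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this replica is down
            nxt[k + 1] += mass * p       # this replica is up
        dist = nxt
    return sum(dist[quorum_size:])


def majority(n: int) -> int:
    """Smallest quorum size such that any two quorums overlap."""
    return n // 2 + 1


if __name__ == "__main__":
    # Hypothetical per-datacenter availability; not a figure from the thesis.
    p_up = 0.9
    for rf in (3, 5, 7, 11):
        avail = quorum_availability([p_up] * rf, majority(rf))
        print(f"replication factor {rf:2d}: majority quorum available {avail:.6f}")
```

Passing a list of unequal probabilities models datacenters with different reliability. Because the calculation treats failures as independent, it can overstate availability when outages are correlated, which is the regime the abstract targets.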

Similar Resources

Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments

The Hadoop MapReduce framework is an important distributed processing model for large-scale, data-intensive applications. The rack-aware data placement strategy of the existing Hadoop distributed file system assumes a homogeneous cluster in which each node has the same computing capacity and is assigned the same workload. Default Hadoop d...

Integrating Machine Learning Techniques to Adapt Protocols for QoS-enabled Distributed Real-time and Embedded Publish/Subscribe Middleware

Quality-of-service (QoS)-enabled publish/subscribe (pub/sub) middleware provides the infrastructure needed to disseminate data predictably, reliably, and scalably in distributed real-time and embedded (DRE) systems. Maintaining QoS properties as the operating environment fluctuates is challenging, however, since the chosen mechanism (e.g., transport protocol or caching algorithm for data persis...

Grid-based Distributed Data Mining Systems, Algorithms and Services

Distribution of data and computation allows for solving larger problems and executing applications that are distributed in nature. The Grid is a distributed computing infrastructure that enables coordinated resource sharing within dynamic organizations consisting of individuals, institutions, and resources. The Grid extends the distributed and parallel computing paradigms allowing resource negoti...

An Effective Key Management Approach to Differential Access Control in Dynamic Environments

Applications like e-newspaper or interactive online gaming have more than one resource and a large number of users. There is a many-to-many relationship between users and resources; each user can access multiple resources and multiple users can access each resource. The resources are independent and each resource needs to be encrypted by a different Resource Encryption Key (REK). Each REK needs...

Architecture and Algorithms for Distributed Rule Management and Processing

Many distributed computing environments are based on the paradigms of peer-to-peer networks and grid-based computing. These environments tend to be structurally dynamic but with volatile resource and service availability. Such environments have rules, constraints, and guidelines that govern how tasks are performed. The rules describe how resources, components and services should be allocated an...

Publication date: 2018